Feat(cv_deploy): Expose list of targeted inactive devices if Workspace submission failed due to ResponseCode.INACTIVE_DEVICES_EXIST #4990

Open
wants to merge 8 commits into
base: devel

alexeygorbunov
Contributor

@alexeygorbunov alexeygorbunov commented Feb 7, 2025

Change Summary

Verify the streaming status of all devices targeted by cv_deploy. If an inactive device caused the Workspace submission to fail, raise an exception and add the message to Errors; if the submission was forced and succeeded despite inactive devices, add the message to Warnings instead.

Related Issue(s)

Fixes #4038

Component(s) name

arista.avd.cv_deploy

Proposed changes

  • cv_client.get_inventory_devices in verify_devices_in_cloudvision_inventory already fetches the actual state of all targeted devices, including their streaming status. Add an attribute _streaming to CVDevice to record each device's streaming state.
  • If Workspace submission succeeded, was forced and inactive devices were present prior to initiating submission - log a warning message and also append it to result.warnings:
TASK [arista.avd.cv_deploy : Deploy device configurations and tags to CloudVision] ***
[WARNING]: [pyavd] - Inactive devices present: [CVDevice(hostname='avd-ci-
core1', serial_number='20C292B489214DF32F9506C242A722FF',
system_mac_address='50:00:00:a1:33:1a', _exists_on_cv=True, _streaming=False)]
[WARNING]: Inactive devices present: [CVDevice(hostname='avd-ci-core1',
serial_number='20C292B489214DF32F9506C242A722FF',
system_mac_address='50:00:00:a1:33:1a', _exists_on_cv=True, _streaming=False)]

        "errors": [],
        "failed": false,
        "warnings": [
            "Inactive devices present: [CVDevice(hostname='avd-ci-core1', serial_number='20C292B489214DF32F9506C242A722FF', system_mac_address='50:00:00:a1:33:1a', _exists_on_cv=True, _streaming=False)]"
        ],
        "workspace": {
            "force": true,
            "requested_state": "submitted",
            "state": "submitted"
        }
  • If Workspace submission failed, was not forced, submit_result.code == ResponseCode.INACTIVE_DEVICES_EXIST and inactive devices were present prior to initiating submission - log warning message and raise exception:
TASK [arista.avd.cv_deploy : Deploy device configurations and tags to CloudVision] ***
[WARNING]: [pyavd] - Failed to submit CloudVision Workspace due to the presence
of inactive devices. Use force to override. Inactive devices:
[CVDevice(hostname='avd-ci-core1',
serial_number='20C292B489214DF32F9506C242A722FF',
system_mac_address='50:00:00:a1:33:1a', _exists_on_cv=True, _streaming=False)].

        "errors": [
            "Failed to submit CloudVision Workspace due to the presence of inactive devices. Use force to override. Inactive devices: [CVDevice(hostname='avd-ci-core1', serial_number='20C292B489214DF32F9506C242A722FF', system_mac_address='50:00:00:a1:33:1a', _exists_on_cv=True, _streaming=False)]."
        ],
        "failed": true,
        "warnings": [],
        "workspace": {
            "force": false,
            "requested_state": "submitted",
            "state": "submit failed"
        }
  • If Workspace submission failed, was not forced, submit_result.code == ResponseCode.INACTIVE_DEVICES_EXIST, but all devices were active prior to initiating submission - log a warning message and raise an exception without listing the devices (they might have changed their streaming status between the device verification and Workspace submission phases):
                msg = (
                    "Failed to submit CloudVision Workspace due to the presence of inactive devices. "
                    "Use force to override. Exact list of inactive devices is unknown."
                )
                LOGGER.warning(msg)
                raise CVWorkspaceSubmitFailedInactiveDevices(msg)
  • In any other failure case - raise the generic submission exception
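
The four cases above can be summarised as one small decision function. This is a minimal sketch, not the code in this PR: `Outcome`, `classify_submission`, the string response code and the generic failure message are hypothetical stand-ins for pyavd internals, and devices are represented by plain hostnames. Only the inactive-device messages are copied from the examples above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Stand-in for ResponseCode.INACTIVE_DEVICES_EXIST (assumption: modelled as a plain string here).
INACTIVE_DEVICES_EXIST = "INACTIVE_DEVICES_EXIST"

PREFIX = "Failed to submit CloudVision Workspace due to the presence of inactive devices. Use force to override. "


@dataclass
class Outcome:
    """Hypothetical result container, not a pyavd class."""

    failed: bool
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def classify_submission(submitted: bool, force: bool, code: str | None, inactive: list[str]) -> Outcome:
    """Map a submission result to warnings/errors, using the device state seen during verification."""
    if submitted:
        # Forced submission succeeded although some devices were inactive: warn only.
        if force and inactive:
            return Outcome(failed=False, warnings=[f"Inactive devices present: {inactive}"])
        return Outcome(failed=False)
    if not force and code == INACTIVE_DEVICES_EXIST:
        if inactive:
            return Outcome(failed=True, errors=[f"{PREFIX}Inactive devices: {inactive}."])
        # All devices looked active at verification time, so the list cannot be given.
        return Outcome(failed=True, errors=[f"{PREFIX}Exact list of inactive devices is unknown."])
    # Any other failure: generic error (placeholder message, not taken from pyavd).
    return Outcome(failed=True, errors=["Failed to submit CloudVision Workspace."])
```

In the real code the failing branches raise exceptions rather than return a value; returning an `Outcome` just keeps the sketch easy to exercise.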

How to test

  • Initiate cv_deploy molecule tests targeting CI environment
  • Check outputs of the cv_deploy with cv_submit_workspace_force = true test
  • Check outputs of the cv_deploy with cv_submit_workspace_force = false test

Checklist

User Checklist

  • N/A

Repository Checklist

  • My code has been rebased from devel before I start
  • I have read the CONTRIBUTING document.
  • My change requires a change to the documentation and the documentation has been updated accordingly.
  • I have updated molecule CI testing accordingly. (check the box if not applicable)

@alexeygorbunov alexeygorbunov requested review from a team as code owners February 7, 2025 04:03

github-actions bot commented Feb 7, 2025

Review docs on Read the Docs

To test this pull request:

# Create virtual environment for this testing below the current directory
python -m venv test-avd-pr-4990
# Activate the virtual environment
source test-avd-pr-4990/bin/activate
# Install all requirements including PyAVD
pip install "pyavd[ansible] @ git+https://github.com/alexeygorbunov/avd.git@issue_4038_improve_inact_devs_err#subdirectory=python-avd" --force
# Point Ansible collections path to the Python virtual environment
export ANSIBLE_COLLECTIONS_PATH=$VIRTUAL_ENV/ansible_collections
# Install Ansible collection
ansible-galaxy collection install git+https://github.com/alexeygorbunov/avd.git#/ansible_collections/arista/avd/,issue_4038_improve_inact_devs_err --force
# Optional: Install AVD examples
cd test-avd-pr-4990
ansible-playbook arista.avd.install_examples

@github-actions github-actions bot added the state: CI Updated CI scenario have been updated in the PR label Feb 7, 2025
@alexeygorbunov alexeygorbunov marked this pull request as draft February 7, 2025 04:10
@ClausHolbechArista
Contributor

We should only warn if requested state for the workspace will not need to submit it. It can be useful to do builds and preview the changes even if devices are not streaming.

@alexeygorbunov
Contributor Author

alexeygorbunov commented Feb 7, 2025

We should only warn if requested state for the workspace will not need to submit it. It can be useful to do builds and preview the changes even if devices are not streaming.

fixed

    existing_devices = await verify_devices_in_cloudvision_inventory(
        ....
        raise_if_inactive=((not workspace.force) and workspace.requested_state == "submitted"),
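
To make the condition in that argument explicit, here is a minimal sketch of the same boolean as a standalone function. The function name `should_raise_if_inactive` is hypothetical, and `"built"` is used only as an example of a requested state that does not submit the Workspace.

```python
def should_raise_if_inactive(force: bool, requested_state: str) -> bool:
    """Raise on inactive devices only when an unforced submission was requested."""
    return (not force) and requested_state == "submitted"


# Enumerate the combinations to see when an early raise would happen.
for force in (False, True):
    for requested_state in ("built", "submitted"):
        decision = should_raise_if_inactive(force, requested_state)
        print(f"force={force!s:<5} requested_state={requested_state:<9} raise={decision}")
```

Only the unforced `"submitted"` combination raises, so builds and previews still go through while devices are not streaming.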

@alexeygorbunov alexeygorbunov marked this pull request as ready for review February 7, 2025 19:46
python-avd/pyavd/_cv/client/exceptions.py Outdated
@@ -99,6 +120,23 @@ async def verify_devices_in_cloudvision_inventory(
device.serial_number = found_device_dict_by_hostname[device.hostname].key.device_id
device.system_mac_address = found_device_dict_by_hostname[device.hostname].system_mac_address

if verify_streaming:
Contributor

It feels like we are building more lists and iterating over everything again and again.

Could we just set the streaming flag during the inspection above?

Contributor Author

You mean extending the CVDevice class with a streaming attribute?

Contributor

Yeah, that would be my suggestion. But if we decide to just submit no matter what, we don't even need to record this; we just pass the force flag as given.
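
The suggestion of recording the streaming state during the existing inventory inspection, instead of building and iterating extra lists, could look roughly like this. It is a sketch under assumptions: `Device` is a simplified stand-in for pyavd's CVDevice (field names follow the log output in this PR), and the inventory is modelled as a plain dict keyed by hostname with hypothetical keys, not the real CloudVision API objects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Device:
    """Simplified stand-in for CVDevice."""

    hostname: str
    serial_number: str | None = None
    system_mac_address: str | None = None
    _exists_on_cv: bool | None = None
    _streaming: bool | None = None


def annotate_devices(devices: list[Device], inventory: dict[str, dict]) -> list[Device]:
    """Set existence and streaming flags in a single pass; return the devices found in the inventory."""
    existing = []
    for device in devices:
        found = inventory.get(device.hostname)
        device._exists_on_cv = found is not None
        if found is None:
            continue
        # Hypothetical inventory keys; the real inventory entries have a different shape.
        device.serial_number = found["device_id"]
        device.system_mac_address = found["system_mac_address"]
        device._streaming = found["streaming"]
        existing.append(device)
    return existing
```

The submission logic can then derive the inactive list on demand, for example `[d for d in existing if d._streaming is False]`, without a second inventory lookup.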

Comment on lines 57 to 59
verify_streaming: bool = False,
raise_if_inactive: bool = False,
inactive_exception_class: type = CVInactiveDevices,
Contributor

I don't think we need all these extra options on this function. If you just gather the streaming state in here, you could move the invocation of the verify to the workspace submission logic where everything would make more sense. It would also mean we would not raise early (before even trying to build the workspace).

Contributor Author

Just to confirm: after moving this streaming check over to the submission phase, should we still raise (and avoid initiating submission) if we discover inactive devices targeted for a configuration update when WS submission is not forced? Or should we just let the submission be initiated and let it fail (although we know in advance that it will fail)?

Contributor

I know I said let's not try, but what if the device started streaming meanwhile. I think we should just submit and catch the error and provide a nice message suggesting the force option.

Contributor Author

I think the challenge here is that, in contrast to a failing build (where we can fetch details for the actual ws_id + build_id pair showing exactly what failed for which device), a failed WS submission does not provide these details. Subscribing to WorkspaceServiceStub just returns an iterator of Workspace objects, and they carry only an overall code and status, with no indication of which devices were not streaming.
If we assume that streaming status can change from inactive to active between the "verifying devices" step and the WS submission step, then it's probably fair to say that it can also change between the failed WS submission and re-fetching the device inventory (streaming status), if we decide to react to the failed submission after the fact. One way to accurately identify the streaming state of the devices during WS submission might be to subscribe to DeviceServiceStub for the whole duration of the submission phase and then correlate any changes with the exact WS submission timestamp, but I'm not sure this isn't overkill.
Maybe we can then indeed just fetch the streaming status at the initial "verification" phase and, if WS submission fails, assume that streaming didn't change between the verification and submission phases.

@alexeygorbunov alexeygorbunov marked this pull request as draft February 12, 2025 20:45
@alexeygorbunov alexeygorbunov changed the title from "Feat(cv_deploy): Verify streaming status and raise before trying to submit Workspace" to "Feat(cv_deploy): Expose list of targeted inactive devices if Workspace submission failed due to ResponseCode.INACTIVE_DEVICES_EXIST" Feb 13, 2025
@alexeygorbunov alexeygorbunov marked this pull request as ready for review February 14, 2025 01:26
Development

Successfully merging this pull request may close these issues.

Improve error message on cv_deploy when devices are not streaming
2 participants